Federated Aggregated Search

Author

  • Andrés Marenco Zúñiga

Abstract

The traditional search engine paradigm has shifted from retrieving plain text documents to selecting a broader combination of document types (e.g. images, videos, maps) that can satisfy the user's information need. Each document type is stored in a specialized database known as a 'vertical', located either locally or in federated locations, and these verticals are nowadays integrated into 'aggregated search engines'. Because each vertical covers a specific domain, only the verticals most likely to contain the desired information should be selected when a query enters the system. To perform this selection, a text representation of each vertical is created by directly sampling a set of documents from the vertical's search engine. However, this representation is often not descriptive enough. Factors such as the heterogeneous nature of the documents or a vertical's lack of cooperation can degrade it. We therefore focus on the problem of building an aggregated search engine that integrates federated collections in an uncooperative environment. We use Wikipedia as a complementary external source of information and investigate three techniques from the literature for enriching the vertical representation: a) using only Wikipedia articles as the representation; b) combining Wikipedia articles with the sample obtained from the vertical; and c) expanding the contents of each sampled document. We found that applying latent Dirichlet allocation (LDA) to model the hidden topics of the documents sampled from each vertical makes it possible to identify Wikipedia articles with the same thematic coverage as the vertical. We then show that representing certain verticals with Wikipedia articles alone improves the selection task.
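
The abstract does not detail how the learned topics are compared with Wikipedia articles. The following Python sketch is a minimal illustration under stated assumptions, not the thesis's method. It fits LDA by collapsed Gibbs sampling on the documents sampled from one vertical. It then ranks candidate Wikipedia articles by the mean log-probability of their words under an equal-weight mixture of the learned topics. The function names, the scoring heuristic, the hyperparameters (`alpha`, `beta`, `iterations`) and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word distribution per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    topics = range(num_topics)
    ndk = [[0] * num_topics for _ in docs]      # topic counts per document
    nkw = [Counter() for _ in topics]           # word counts per topic
    nk = [0] * num_topics                       # token count per topic
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)       # random initial assignment
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1                  # remove the token's current assignment
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + v * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Smoothed topic-word distributions (phi); each sums to 1 over the vocabulary.
    return [{w: (nkw[k][w] + beta) / (nk[k] + v * beta) for w in vocab} for k in topics]


def topic_match_score(article, phi, floor=1e-9):
    """Hypothetical heuristic: mean log-probability of the article's tokens
    under an equal-weight mixture of the vertical's topics."""
    if not article:
        return float("-inf")
    total = 0.0
    for w in article:
        p = sum(topic.get(w, 0.0) for topic in phi) / len(phi)
        total += math.log(max(p, floor))        # words unseen in the sample get a floor
    return total / len(article)


def select_wikipedia_articles(sample_docs, wiki_articles, num_topics=2, top_n=1):
    """Rank candidate Wikipedia articles (title -> tokens) against one vertical's sample."""
    phi = train_lda(sample_docs, num_topics)
    ranked = sorted(wiki_articles,
                    key=lambda title: topic_match_score(wiki_articles[title], phi),
                    reverse=True)
    return ranked[:top_n]


# Toy example: documents sampled from a hypothetical sports vertical.
sample = [
    "goal match league striker goal".split(),
    "league season coach match".split(),
    "striker transfer league goal".split(),
]
wiki = {
    "Association football": "football league goal match striker".split(),
    "Photosynthesis": "plant light chlorophyll energy".split(),
}
print(select_wikipedia_articles(sample, wiki))  # ['Association football']
```

In practice a library implementation of LDA and a proper topic-inference step for each article would replace this toy sampler and the equal-weight mixture.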
Second, we explored using the modeled topics together with Wikipedia categories to boost the scores of verticals that could be associated with the query string. The results in this case are inconclusive. However, the experiments suggest that classifying the query and then matching the resulting categories against the verticals' categories can increase the effectiveness of vertical selection.
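
The category-boosting idea can be sketched in Python as follows. This is a hedged illustration, not the thesis's implementation. The keyword-based `classify_query` stands in for a real query classifier. `vertical_categories` is assumed to be precomputed, for instance from the Wikipedia categories of the articles matched to each vertical. The base scores, the boost `weight` and the Jaccard-based formula are all hypothetical.

```python
def classify_query(query, category_keywords):
    """Toy stand-in for a query classifier: a category is assigned
    if any of its keywords occurs in the query."""
    terms = set(query.lower().split())
    return {cat for cat, keywords in category_keywords.items() if terms & keywords}


def boost_scores(base_scores, query_categories, vertical_categories, weight=0.5):
    """Scale each vertical's base score by (1 + weight * Jaccard overlap)
    between the query's categories and the vertical's categories."""
    boosted = {}
    for vertical, score in base_scores.items():
        cats = vertical_categories.get(vertical, set())
        union = query_categories | cats
        overlap = len(query_categories & cats) / len(union) if union else 0.0
        boosted[vertical] = score * (1.0 + weight * overlap)
    return boosted


category_keywords = {
    "Sports": {"football", "league", "goal"},
    "Geography": {"map", "city", "route"},
}
vertical_categories = {          # hypothetical, e.g. derived from matched Wikipedia articles
    "video": {"Sports", "Music"},
    "maps": {"Geography"},
    "images": {"Art"},
}
base = {"video": 0.40, "maps": 0.45, "images": 0.30}   # hypothetical selection scores

cats = classify_query("football league highlights", category_keywords)
boosted = boost_scores(base, cats, vertical_categories)
print(sorted(boosted, key=boosted.get, reverse=True))  # ['video', 'maps', 'images']
```

Without the boost, the maps vertical would rank first. The Jaccard overlap is only one possible choice: it slightly penalizes verticals with many categories. An additive boost or a plain intersection count would be equally plausible variants.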


Similar resources

Aggregated Search: A New Information Retrieval Paradigm

Traditional search engines return ranked lists of search results. It is up to the user to scroll this list, scan within different documents, and assemble information that fulfills his/her information need. Aggregated search represents a new class of approac...


Approaches to implement and evaluate aggregated search

Aggregated search or aggregated retrieval can be seen as a third paradigm for information retrieval following the boolean retrieval paradigm and the ranked retrieval paradigm. In the first two, we are returned respectively sets and ranked lists of search results. It is up to the time-poor user to scroll this set/list, scan within different documents and assemble his/her information need. Altern...


Federated semantic search using terminological thesauri for learning object discovery

Purpose – The purpose of this paper is to propose a framework and system to address the inability to discover new and authentic learning material and the lack of a single access point for search and browsing of remote learning object repositories (LORs). Design/methodology/approach – The authors develop a framework for keyword-based query expansion using SKOS domain terminologies and implement ...


Learning Resource Referencing, Search and Aggregation at the eLearning System Level

TELOS is a new eLearning system being built within the Canadian LORNET project. TELOS aims to provide an open operating system in which users can develop and use eLearning and knowledge management resources and environments within a service-oriented and ontology-driven framework. A special emphasis is put on the aggregation of resources through a graphic scenario editor and the referencing of t...


Preserving the owner’s autonomy in networks of patient registries and biobanks

Background: To achieve statistical significance in rare disease research, bio- or data samples taken from one patient registry or biobank may need to be complemented by those of other institutions [1,2]. While a first overview of potential research partners can be obtained using public catalogues as established by BBMRI [3] or Orphanet [4], this article focuses on mediation services, which provide...



Publication date: 2014